ABSTRACT


Queuing network models of a computer system operating
with a single workload type are presented. Programs
which operate on the Texas Instruments SR-52 programmable
calculator are included.

SOME QUEUING NETWORK MODELS OF COMPUTER SYSTEMS


Queuing network models provide a basic tool for under- 
standing computer systems and predicting how they will perform. 

The use of networks of queues to describe what is going
on inside the computer is a relatively old idea, but its
widespread application to practical problems has only recently
taken hold. In September of 1978 the ACM devoted a special issue
of Computing Surveys to Queuing Network Models of Computer
Systems Performance. The issue contains eight outstanding
articles: the editor's overview, three tutorials, three
application notes and an assessment of the field of analytic
modeling. The excellent tutorial by Denning and Buzen [1]
provides a point of departure for this paper.

In an earlier paper by this author [2], conventional
Markov modeling techniques were used to develop a simple model of
n terminals dealing with a single server system. A program for
the Texas Instruments SR-52 programmable calculator was presented
in that paper. The very compact algorithms presented in the
tutorial by Denning and Buzen provided the inspiration to attempt
more complex models on the SR-52 programmable calculator. Four
programs are presented in this paper. They provide a capability
to handle a large number of closed network, single workload
problems.

In modeling terminology closed systems are systems in
which there is a limited population of jobs; they are called
closed because jobs don't enter and leave but continue to
circulate within the system. Most real computer systems deal with
limited job populations because there are limited facilities for
handling jobs: interactive job populations are limited by the
number of terminals attached to the system; batch jobs may be
limited by available job input storage space; both are limited
during execution by fixed amounts of main memory or software
imposed multiprogramming limits. Thus models of closed systems
are the most appropriate for these real system environments.

The computational requirements for network queuing
models increase with the complexity of the system being modeled.
The simplest and easiest closed system models have two servers, a
single workload and up to perhaps three jobs active; pencil,
paper, and patience are sufficient computational resources to
handle these models.

For larger job populations - up to perhaps six or seven -
an inexpensive calculator with three memory registers can replace
the pencil and paper. Here the limit - six or seven - is
established by the stamina and dexterity of the analyst. An
arbitrarily large job population can be accommodated on a
programmable calculator with as few as 10 memory registers and
200 program steps. (This was shown in [2].) In this paper, still
dealing with an arbitrary job population and a single workload,
the central server system may consist of up to six separate
devices, or five devices one of which may have a load dependent
service time. (The load dependent service time function is
restricted to a simple function of the number of jobs in the
queue.) This size of problem can be handled with the 20 registers
and 224 program steps available on the SR-52. This central system
model requires two memory locations per device plus seven or
eight locations for other variables and indices.

The marvelous thing about all of this is that the
algorithm developed by Buzen (and used in these programs)
implicitly enumerates all of the system states which can occur
for n jobs visiting k devices, and solves the associated
equations. A system state is any unique distribution of the
number of jobs at each device in the system. The number of ways n
jobs can be distributed among k devices is given by the
expression:

$$ \frac{(n + k - 1)!}{n!\,(k - 1)!} $$

The result for a central system with five devices and a
population of 20 jobs is 10,626 states. Solving the resulting
10,626 linear equations by brute force techniques would require
tens of thousands of memory locations to manage the problem. With
Buzen's algorithm (and a modest twist added by this author) any
single workload problem can be handled with two locations per
device plus about eight overhead registers. (Note: The main
benefit of Buzen's fast algorithm is the reduction in the number
of arithmetic operations required to enumerate and solve the
equations. From the viewpoint of storage the algorithm Buzen
describes actually requires one location per device plus one
location per job plus overhead. The twist added to further compact
the required storage is to evaluate the matrix row by row instead
of column by column. On the SR-52 this means an unlimited job
population can be handled with a maximum of six devices.)
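
The state count quoted above can be checked with a few lines of Python. This is only an illustrative sketch; the function name state_count is an assumption of this example and does not appear in the calculator programs.

```python
from math import comb


def state_count(n_jobs: int, k_devices: int) -> int:
    """Number of ways to distribute n_jobs among k_devices.

    Equals (n + k - 1)! / (n! (k - 1)!), written as a binomial coefficient.
    """
    return comb(n_jobs + k_devices - 1, k_devices - 1)


# Five devices and a population of 20 jobs, as in the text.
print(state_count(20, 5))  # 10626
```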





Much more powerful and sophisticated tools are required
to handle multiple load dependent servers, multiple classes of
jobs, and a variety of queue service disciplines. The BEST/1
program offered by BGS Systems and the CADS program offered by
Information Research Associates are two such tools; they require
tens of thousands of memory locations for instructions and data
space, and they run on large scale computer systems.

In today's world of programmable calculators the Texas 
Instruments SR-52 has been replaced by the TI-59. It provides 
roughly twice the capacity for the same price. The programs 
presented in this paper can be easily converted for use on the 
newer TI-59. This newer calculator provides sufficient space to 
tackle some simple two-workload problems and will be the host for 
future model developments by this author.

THE CLOSED SYSTEM MODELS

Four programs have been developed to aid in the analysis
of closed queuing networks.

   1. Batch model with homogeneous service times

   2. Batch model with one load dependent server

   3. Interactive model with up to five devices

   4. Interactive model with a load dependent central server

The two programs for batch models will be discussed
together since there are only minor variations between the two.
Then the interactive models will be presented.

THE BATCH MODELS 

A sample problem is used to introduce the nomenclature and
to demonstrate how the programs may be used. Figure 1
illustrates five servers in a batch processing system. At the
bottom of the figure is a table showing the average job's
characteristics. The typical job visits the swap device one time
per job and requires 0.8 seconds to swap the job in. The job
visits both the CPU and the channel 100 times, once for each disk
input/output. Disk 1 gets 70% of the traffic; Disk 2 gets 30%.
The service time per visit is shown for each device. The numbers
which are needed in the model are the total service times for the
job at each device, Y_k = V_k S_k. The CPU, at 4 seconds of total
service, carries the heaviest load and will be the device which
ultimately limits throughput.
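
As a small worked illustration of Y_k = V_k S_k, the sketch below recomputes the total service times from the visit counts and per-visit times of the sample problem. The variable names (devices, demand, bottleneck) are assumptions of this example only.

```python
# (device name, number of visits V_k, seconds per visit S_k) for the sample problem
devices = [
    ("CPU", 100, 0.040),
    ("Swap", 1, 0.8),
    ("Disk 1", 70, 0.030),
    ("Disk 2", 30, 0.030),
    ("Channel", 100, 0.012),
]

# Total service time per job at each device: Y_k = V_k * S_k
demand = {name: visits * per_visit for name, visits, per_visit in devices}

# The device with the largest Y_k is the one that ultimately limits throughput.
bottleneck = max(demand, key=demand.get)

for name, y in demand.items():
    print(f"{name:8s} Y = {y:.1f} sec")
print("heaviest load:", bottleneck)
```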





Figure 1

Sample Problem - A Batch Processor


JOB CHARACTERISTICS

DEVICE      DEVICE    NO. OF        TIME PER      TOTAL SERVICE
NAME        NO. k     VISITS V_k    VISIT S_k     Y_k, SEC.

Swap          2          1            .8              .8
CPU           1        100            .040           4.0
Disk 1        3         70            .030           2.1
Disk 2        4         30            .030            .9
Channel       5        100            .012           1.2








The Buzen algorithm fills in numbers in a two-dimensional
matrix g. Columns in the matrix correspond to devices in the
system and rows to the number of jobs. Elements of the matrix are
computed from the adjacent elements, above and to the left, as
shown in the figure below. Initially the first row contains 1's
and the first column contains 0's.


                            DEVICES

               0     1     2    ...    k-1         k       ...     K

         0     0     1     1    ...     1          1       ...     1
         1     0
         2     0
         .     .
 JOBS   n-1    0                                g(n-1,k)
                                                    | x Y_k
         n     0                     g(n,k-1)  +  = g(n,k)
         .     .
        N-1    0                                                G(N-1,K)
         N     0                                                G(N,K)

Each element is computed as follows:

$$ g(n,k) = g(n,k-1) + Y_k \, g(n-1,k) $$

where the Y_k multiplier is the service time of the job at
device k.

At the end of the computation the quantity G(N,K) is
found.* This is the normalizing constant for the product form
equations where all devices have homogeneous service times. That
is, the service time of the device is the same regardless of how
many jobs are waiting in the queue. The rightmost column of the
matrix contains the complete series of normalizing constants from
G(1,K) through G(N,K). The performance measures of interest are
functions of these normalizing constants and the device service
times.
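
The recurrence can be sketched in Python as follows. This is not the SR-52 program; it is a minimal illustration that evaluates the matrix row by row, keeping one stored value per device (plus its Y_k), in the spirit of the storage arrangement described earlier. The function name buzen_rows and the list layout are assumptions of this example.

```python
def buzen_rows(demands, n_jobs):
    """Return [G(0,K), G(1,K), ..., G(N,K)] for devices with fixed service times.

    demands[k-1] holds Y_k. Only the current row of the g matrix is stored,
    so storage does not grow with the number of jobs.
    """
    row = [1.0] * len(demands)      # row 0: g(0,k) = 1 for every device
    last_column = [1.0]             # G(0,K)
    for _ in range(n_jobs):
        left = 0.0                  # column 0: g(n,0) = 0
        for k, y in enumerate(demands):
            # row[k] still holds g(n-1,k); left holds g(n,k-1)
            left = left + y * row[k]
            row[k] = left           # now g(n,k)
        last_column.append(row[-1])
    return last_column


# Sample problem: CPU, swap, disk 1, disk 2, channel (total service times, sec)
G = buzen_rows([4.0, 0.8, 2.1, 0.9, 1.2], 5)
for n, value in enumerate(G):
    print(n, round(value, 2))
```

With the sample problem's service times the last column should agree with the G(N,K) row of Table I, to the precision shown there.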


System throughput:

$$ X(N) = \frac{G(N-1,K)}{G(N,K)} $$

Utilization of device k:

$$ U_k(N) = Y_k \, \frac{G(N-1,K)}{G(N,K)} $$

Mean queue length at device k:

$$ Q_k(N) = \sum_{n=1}^{N} Y_k^{\,n} \, \frac{G(N-n,K)}{G(N,K)} $$

Service time of an equivalent load dependent server:

$$ S(N) = \frac{1}{X(N)} $$

An alternative way of calculating the mean queue length
is given by the following recursive formula:

$$ Q_k(N) = U_k(N) \, \bigl(1 + Q_k(N-1)\bigr) $$

This method is particularly useful because one storage
location per device is all that is needed to accumulate the
mean queue length for an unlimited job population. The other
expression implies storage for the complete column of n values of
G(n,K).
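
The performance measures and the queue length recursion can be illustrated together. The sketch below is an assumption-laden Python rendering, not the calculator listing: it keeps the whole column of normalizing constants for clarity (which the SR-52 program avoids), and the names normalizing_constants, batch_measures and queue_length_by_sum are invented for this example. The second function evaluates the summation form so the two methods can be compared.

```python
def normalizing_constants(demands, n_jobs):
    """Column of G(n,K) for n = 0..N using Buzen's recurrence."""
    g = [1.0] + [0.0] * n_jobs
    for y in demands:
        for n in range(1, n_jobs + 1):
            g[n] += y * g[n - 1]
    return g


def batch_measures(demands, n_jobs):
    """Throughput, service time, utilizations and device 1 queue for n = 1..N."""
    G = normalizing_constants(demands, n_jobs)
    q1 = 0.0
    results = []
    for n in range(1, n_jobs + 1):
        x = G[n - 1] / G[n]                 # X(n) = G(n-1,K) / G(n,K)
        util = [y * x for y in demands]     # U_k(n) = Y_k X(n)
        q1 = util[0] * (1.0 + q1)           # Q_1(n) = U_1(n) (1 + Q_1(n-1))
        results.append({"N": n, "X": x, "S": 1.0 / x, "U": util, "Q1": q1})
    return results


def queue_length_by_sum(demands, n_jobs, k):
    """Mean queue length at device k (1-based) from the summation formula."""
    G = normalizing_constants(demands, n_jobs)
    y = demands[k - 1]
    return sum(y ** n * G[n_jobs - n] for n in range(1, n_jobs + 1)) / G[n_jobs]


sample = [4.0, 0.8, 2.1, 0.9, 1.2]   # CPU is device 1
for row in batch_measures(sample, 5):
    print(row["N"], round(row["Q1"], 3), round(row["X"], 3), round(row["S"], 2))
```

The recursion needs only the running value of Q_1, which is why a single storage location per device suffices.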

* Note on Nomenclature: In this paper g(n,k) denotes an
intermediate value in the g matrix and G(N,K) is the final value
corresponding to N jobs and K devices. Similarly h(m,k) and
H(M,K) denote intermediate and final values in the h matrix for
interactive systems.

Batch Model With Homogeneous Service Times

The program for the batch model with homogeneous service 
times will handle up to six devices and any number of jobs spec- 
ified by the user. Its short name is Batch HST-6. 

The model is used where the service times for all devices 
are homogeneous. 

The program is a straightforward implementation of 
Buzen's algorithm. Due to limited storage space the mean queue 
length is computed only for device #1. The following points cover 
inputs, outputs, and controls for the program: 


Inputs to the model

   • number of jobs                      N
   • device number (1 - 6)               k
   • device service times                Y_k


Outputs in order of presentation are:

   • number of jobs                      N
   • mean queue length at device 1       Q
   • normalizing constant                G(N,K)
   • throughput with N jobs              X(N)
   • mean job service time               S(N)
   • up to six pairs of:
        - device number                  k
        - utilization                    U_k
   • 99 indicating end of output

Input Controls - N, k, Y_k plus RUN


These three controls are located on function keys A, B, C,
respectively. Depressing the key interrupts program
execution and displays the current value of the variable.

   • insert a new value if required

   • hit RUN to confirm your input action

Note: k is a dual purpose input.

   • It indicates which device time, Y_k, will be
     input next during input operations.

   • It indicates the highest numbered device K
     to be modeled during execution.

Execution Controls - EXEC, RES, RUN


EXEC   Executes the program starting with an
       initialization of all required registers. The
       program will run until results are to be
       presented for a load of N jobs. EXEC is on
       function key E.

RUN    The program halts and displays its outputs in
       the preset order indicated above. RUN is
       used for two purposes:

       1. to obtain the next display in the cycle
       2. at the end of the output cycle depressing
          RUN will continue the operation increasing
          the load to N + 1 without having to
          compute from scratch with a new EXECUTE.

RES    Resume is a special control which will
       continue the computation of g(n,k) without
       starting from scratch. It is intended to
       provide a shortcut around the logic which
       computes and displays the utilization
       statistics. It can be safely used at any
       point in the output cycle to advance to the
       next level of load.

Batch HST-6 has two main uses. The first, and most 
obvious, is to use it to model a batch processing system. Its 
second purpose is to model any subsystem of up to six devices, in 
order to obtain the schedule of service times for an equivalent 
load dependent single server. An example of its use in this role 
will be given in the description of the interactive model with a 
load dependent central server. 

Recall again the sample problem in Figure 1. The CPU 
portion of the job is the largest component. The CPU will tend to 
be the limiting device so we assign it to device #1, to obtain the 
mean queue length. 

Table I shows the results of running the program for job
populations N = 1 through 5. Reading down each column, the
results appear in the order in which the program produces them.
In the output routine the device utilizations are output as a
pair of numbers: first the device number, then the utilization at
that device. Only the utilizations appear in Table I for each
column.

Table I


Batch HST-6 Results for Sample Problem


Jobs N                   1        2        3        4        5

Queue 1 (CPU)          .444     .997     1.65     2.40     3.23
G(N,K)                 9.0     52.15     252     1111     4684
Throughput X(N)        .111     .172     .207     .226     .237
Service Time S(N)      9.0      5.79     4.83     4.41     4.21

Utilizations:

  5 - Channel          .133     .207     .248     .272     .285
  4 - Disk 2           .100     .155     .186     .204     .214
  3 - Disk 1           .233     .362     .435     .476     .498
  2 - Swap             .089     .138     .166     .181     .190
  1 - CPU              .444     .690     .828     .906     .949


The performance of the CPU is the main limiter in the system
because the work is so CPU heavy. With five jobs active the CPU
will be almost 95% busy and on the average there are 3.2 jobs at
the CPU.

The appendix provides a listing of the Batch HST-6 program for
the SR-52.

Batch Model With One Load Dependent Server


The second program is a minor variation on Batch HST-6 which
allows one of the devices to be a load dependent server. When
load dependent service is introduced at one of the devices the
Buzen algorithm, slightly modified, can still be used to determine
the throughput of the system. The device queue lengths, however,
are no longer simple functions of the normalizing constants
G(N,K), and algorithms more complicated than can easily be handled
on the SR-52 are required to compute these performance
quantities.

In the modified program three registers are used to specify a
simple model of the load dependent server. (In Batch HST-6 two of
the registers were used for device #6 and one was used to
accumulate the device #1 queue length.) The net result is a
program that can handle five devices. Device #1 is the load
dependent server. The short name for this program is Batch LDS-5.

The load dependent server model is a simple function of the
number of users in the device queue. Figure 2 illustrates the
function. The base service time, B, is a constant service time
the job experiences up to the load at which the inflection point
occurs in the function. Beyond the inflection point load, L, the
service time per job increases by the increment amount, I, for
each additional user.

Figure 2



Stated another way:

$$ Y(1,n) = B \qquad \text{for } n \le L $$

$$ Y(1,n) = B + (n - L)\,I \qquad \text{for } n > L $$

where:

   Y(1,n) = service time at device #1 with n in queue
   B = Base service time
   L = Load at the inflection point
   I = Increment per job in queue

The modification to Buzen's algorithm is simply to create the
elements in column 1 by multiplying the previous row's value by
the appropriate Y(1,n). For the device 1 column:

$$ g(n,1) = Y(1,n) \, g(n-1,1) $$

The remaining rows and columns of the matrix are formed in the
same way as previously described. At the end of the matrix
computation G(N-1,K) and G(N,K) are available. These allow the
following to be easily computed:


System throughput:

$$ X(N) = \frac{G(N-1,K)}{G(N,K)} $$

Service time of an equivalent single server:

$$ S(N) = \frac{1}{X(N)} $$

For devices with homogeneous service times the utilizations
can be computed from the relationship:

Utilization at device k:

$$ U_k = Y_k \, X(N) $$
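
A minimal Python sketch of the modified algorithm follows. It assumes the B, L, I service time function with B applying for loads up to and including L; the function names are invented for this illustration and the code is not a transcription of the Batch LDS-5 listing.

```python
def lds_service_time(n, base, load, increment):
    """Y(1,n): base time up to the inflection load, then a linear increase."""
    if n <= load:
        return base
    return base + (n - load) * increment


def lds_constants(base, load, increment, other_demands, n_jobs):
    """G(n,K) for n = 0..N with device 1 load dependent, the rest homogeneous."""
    # Column 1: g(n,1) = Y(1,n) * g(n-1,1), starting from g(0,1) = 1
    g = [1.0]
    for n in range(1, n_jobs + 1):
        g.append(lds_service_time(n, base, load, increment) * g[n - 1])
    # Remaining columns use the ordinary recurrence g(n,k) = g(n,k-1) + Y_k g(n-1,k)
    for y in other_demands:
        for n in range(1, n_jobs + 1):
            g[n] += y * g[n - 1]
    return g


# CPU as device 1 with B = 4.0, L = 2, I = 1.0; swap, disk 1, disk 2, channel follow
G = lds_constants(4.0, 2, 1.0, [0.8, 2.1, 0.9, 1.2], 5)
throughput = [G[n - 1] / G[n] for n in range(1, 6)]
for n, x in enumerate(throughput, start=1):
    print(n, round(G[n], 2), round(x, 3), round(1.0 / x, 2))
```

Setting I = 0 reduces the sketch to the homogeneous case, which gives a quick consistency check against the earlier batch model.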


The following narrative covers the inputs, outputs and
controls for the Batch LDS-5 program:


Inputs to the model

   • number of jobs                      N
   • device number (1 - 5)               k
   • device service times                Y_k
   • base service time                   B
   • load at inflection point            L
   • increment per job                   I

Outputs in order of presentation are:

   • number of jobs                      N
   • device 1 service time               Y(1,n)
   • normalizing constant                G(N,K)
   • throughput with N jobs              X(N)
   • mean job service time               S(N)
   • up to five pairs of:
        - device number                  k
        - utilization                    U_k
   • 99 indicating end of output

Input Controls - N, k, Y_k plus RUN

These three controls are located on function keys A, B, C,
respectively. Depressing the key interrupts program
execution and displays the current value of the variable.

   • insert a new value if required

   • hit RUN to confirm your input action

Note: k is a dual purpose input.

   • It indicates which device time, Y_k, will be
     input next during input operations.

   • It indicates the highest number of devices to
     be modeled during execution.

Input Controls for Load Dependent Server - B, L, I plus RUN

The parameters B, L and I are inserted as a group using
function key D and the RUN key. Operation is as follows:

   Depress function key D labeled B,L,I
   Current value of B is displayed
   Insert new value if desired and depress RUN
   New value of B is displayed
   Depress RUN

   Current value of L is displayed
   Insert new value if desired and depress RUN
   New value of L is displayed
   Depress RUN

   Current value of I is displayed
   Insert new value if desired and depress RUN
   New value of I is displayed
Execution Controls - EXEC, RUN

EXEC   Executes the program starting with an initialization
       of all required registers. The program will
       run until results are to be presented for a load of
       N jobs. EXEC is on function key E.

RUN    The program halts and displays its outputs in the
       preset order indicated above. RUN is used for two
       purposes:

       1. to obtain the next display in the cycle

       2. at the end of the output cycle depressing RUN
          will continue the operation increasing the load to
          N + 1 without having to start from scratch.

Batch LDS-5 has the same main uses as Batch HST-6, with the 
addition of a single load dependent server. It can be used to 
model a batch system or to model a subsystem of up to five devices 
in order to obtain an equivalent load dependent single server. 


Once again let us use the sample problem of Figure 1. This
time we will introduce load dependent service on the CPU to see
how it may affect the performance of the system. We will use the
4.0 sec of CPU time as the base service time. Beyond a load of
two in the queue, the service time will be increased one second
per job in the queue. That is to say:

   B = 4.0     L = 2     I = 1.0

Table II shows the results of running the program for job
populations 1 through 5. The format of the table is similar to
Table I, except that the queue at device 1 is not computed or
presented. The device 1 service time with n in queue is presented
in the second row.


Table II

Batch LDS-5 Results for Sample Problem


Jobs N                      1        2        3        4        5

Y(1,n) service time        4.0      4.0      5.0      6.0      7.0
G(N,K)                     9.0     52.15     268     1415     8398
Throughput X(N)            .111     .172     .195     .189     .168
Service Time S(N)          9.0      5.79     5.13     5.29     5.93

Utilizations:

  5 - Channel              .133     .207     .234     .227     .202
  4 - Disk 2               .100     .155     .176     .170     .152
  3 - Disk 1               .233     .362     .409     .397     .354
  2 - Swap                 .089     .138     .156     .151     .135
  1 - CPU                  .444     .690     .974*   1.135*   1.18*


Comparing Tables I and II, one can see the effects of the
load dependent CPU. In the first two columns of the table the
performance measures are, of course, the same; the CPU service
time is still the base value of 4.0 seconds. At the load of three
jobs there is a slight loss in throughput compared with the first
case study. Scanning across the throughput row, one finds that
the maximum occurs at three jobs on the system. Beyond three jobs
active the increased CPU time per job has a larger negative
effect than the usual positive effect of adding more jobs to the
multiprogramming set. This sort of thing can happen in overloaded
systems, where the per job device time may increase under heavy
use; for example, as a result of increased competition for
memory, more page fault interruptions may be required, resulting
in more CPU and disk service time per job. The BLI function is
used to represent increased system overhead past the threshold of
thrashing.

One final note: the CPU utilizations marked with an
asterisk in the table are the ones reported by the program. They
are not correct because the CPU is a load dependent server. These
utilizations were computed by multiplying maximum service time by
throughput. The true value of the utilization of the load
dependent server lies between this upper limit and a lower limit
computed as the product of minimum service time and throughput.
The algorithm for computing utilizations and queue lengths for the
load dependent server would exceed the space available in the
SR-52.


The appendix provides a listing of the Batch LDS-5 program.
Of particular note is the load dependent server model located at
program steps 091 through 115. This section of the program can be
changed to create other load dependent models. Registers 16, 17,
and 18 contain the variables B, L, and I; these may assume
different meanings or usage in a different load dependent server
model. Any substitute function should start at the same location,
091, and place the appropriate value of service time in Register
01 prior to LABEL C, currently at location 117. Note that a
completely arbitrary load dependent service time schedule can be
entered dynamically by replacing the current LDS routine with a
halt and display of the current value of Register 01. If the user
changes the register to a new value it will be the next one used.
The required routine would be:


   RCL 01
   HLT
   STO 01

Using this routine it is possible to model any number of devices
by breaking the system into device groupings of 4 to 6 devices.
For example, a 10 device system could be modeled as a six device
subsystem plus four individual devices. First Batch HST-6 is run
on the six device subsystem to obtain a schedule of load
dependent service times. Then Batch LDS-5 is run using the
subsystem service times for device 1 and the remaining four
devices as devices 2 through 5. This is an exact method of
combining multiple devices.
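
The two-step procedure can be illustrated with a short Python sketch. To keep the numbers familiar it splits the five devices of the sample problem (rather than a 10 device system) into a three device subsystem and two remaining devices; that split, and the helper names convolve and throughputs, are assumptions of this example. Comparing the combined result with a direct solution of all five devices shows the agreement that the text describes as exact.

```python
def convolve(g, demands):
    """Apply g(n,k) = g(n,k-1) + Y_k g(n-1,k) for each homogeneous device."""
    for y in demands:
        for n in range(1, len(g)):
            g[n] += y * g[n - 1]
    return g


def throughputs(g):
    """X(n) = G(n-1) / G(n) for n = 1..N."""
    return [g[n - 1] / g[n] for n in range(1, len(g))]


N = 5
all_demands = [4.0, 0.8, 2.1, 0.9, 1.2]
subsystem, remaining = all_demands[:3], all_demands[3:]   # illustrative split

# Step 1 (role of Batch HST-6): solve the subsystem alone, S(n) = 1 / X(n)
g_sub = convolve([1.0] + [0.0] * N, subsystem)
schedule = [1.0 / x for x in throughputs(g_sub)]

# Step 2 (role of Batch LDS-5): the schedule becomes device 1, column 1 is
# g(n,1) = S(n) g(n-1,1); the remaining devices are added as usual
g_combined = [1.0]
for s in schedule:
    g_combined.append(s * g_combined[-1])
g_combined = convolve(g_combined, remaining)

# Reference: all five devices solved directly
g_direct = convolve([1.0] + [0.0] * N, all_demands)

for n in range(1, N + 1):
    print(n, round(g_combined[n], 3), round(g_direct[n], 3))
```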

One user of this program has noted that the BLI function is
an approximation to what happens during thrashing. He reports two
additional subsystem approximations that he has found useful. For
a P processor multiprocessor,

$$ Y(1,n) = S \cdot \min(n,P)^{-B} $$

is a reasonable form for an approximation. B = 1 represents an
ideal multiprocessor. With B < 1 various amounts of
multiprocessor mutual interference can be modeled. For an I/O
subsystem a power curve fit of the form

$$ Y(1,n) = A \cdot \min(n,V)^{B} $$

provides a good approximation. In this case V is an arbitrary
maximum value which is specified by the user. TI program STI-09
from the statistics library is handy for determining A and B.





THE INTERACTIVE MODELS 


The tutorial by Denning and Buzen also presented a very 
compact algorithm for handling interactive systems. Again we will 
use a sample problem approach. Figure 3 illustrates such a sys- 
tem. It has M terminals connected to a central system. Each of 
the terminal users has an average think time Z. The central sys- 
tem has the same five devices and associated service times as the 
previous batch cases had. (This will facilitate using the batch 
results in solving the interactive systems.) For the case study 
we will use a think time of 10 seconds and terminal populations 
M = 2, 4, 6, 8, 10. 

The M terminals represent M jobs in the system as a whole. 
Actually some number N are on the central system at any given time 
and M-N jobs are out at the terminals. The terminals are treated 
as a single subsystem whose service time is Z/n when there are n 
users thinking (i.e. jobs at the terminals). The terminal sub- 
system is thus a load dependent server. The devices in the 
central system all have homogeneous service times. 

The algorithm, attributed to Williams and Bhandiwad [3], is
quite similar to the Buzen algorithm described previously. The
interactive algorithm fills in a two dimensional matrix h; the
columns correspond to the k devices and the rows correspond to
the m terminals. Elements of the matrix are computed from the
adjacent elements, above and to the left, as shown in the figure
below. Initially row 0 and column 0 contain 1's.

Each element of the matrix is computed as follows:

$$ h(m,k) = h(m,k-1) + \frac{m \, Y_k}{Z} \, h(m-1,k) $$

where Y_k is the service time of the job at device k (Y_k = V_k S_k).

Figure 3


JOB CHARACTERISTICS

DEVICE      DEVICE    NO. OF        TIME PER      TOTAL SERVICE
NAME        NO. k     VISITS V_k    VISIT S_k     Y_k, SEC.

Swap          2          1            .8              .8
CPU           1        100            .040           4.0
Disk 1        3         70            .030           2.1
Disk 2        4         30            .030            .9
Channel       5        100            .012           1.2


                               DEVICES

                   0     1    ...    k-1         k        ...     K

             0     1     1    ...     1          1        ...     1
             1     1
             2     1
             .     .
 TERMINALS  m-1    1                          h(m-1,k)
                                                  | x m Y_k / Z
             m     1              h(m,k-1)  +  = h(m,k)
             .     .
            M-1    1                                           H(M-1,K)
             M     1                                           H(M,K)


At the end of the computation the values H(M-1,K) and H(M,K) are
available. These allow the following performance measures to be
computed.

Central system idle probability:

$$ P(0) = \frac{1}{H(M,K)} $$

Throughput:

$$ X(M) = \frac{M}{Z} \cdot \frac{H(M-1,K)}{H(M,K)} $$

Response time:

$$ R(M) = \frac{M}{X(M)} - Z $$

Mean active load:

$$ Q = M - Z \, X(M) $$

Because the devices of the central subsystem are homogeneous
the utilization of each device is simply the product of Y_k, the
service time, and X(M), the throughput of the system.
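
The interactive algorithm and its performance measures can be sketched in Python as shown below. As with the batch sketches this is an illustration under stated assumptions rather than the calculator program: the function name interactive_measures and the dictionary of results are invented here, and the matrix is evaluated one row at a time.

```python
def interactive_measures(demands, think_time, terminals):
    """Performance measures for M >= 1 terminals and homogeneous devices.

    demands[k-1] holds Y_k; think_time is Z; terminals is M.
    """
    h = [1.0] * (len(demands) + 1)       # row 0: h(0,k) = 1 for k = 0..K
    h_prev = 1.0
    for m in range(1, terminals + 1):
        h_prev = h[-1]                   # H(m-1,K) before the row is updated
        left = 1.0                       # column 0: h(m,0) = 1
        for k, y in enumerate(demands, start=1):
            # h(m,k) = h(m,k-1) + (m Y_k / Z) h(m-1,k)
            left = left + (m * y / think_time) * h[k]
            h[k] = left
    H = h[-1]
    x = (terminals / think_time) * h_prev / H
    return {
        "H": H,
        "P0": 1.0 / H,                        # central system idle probability
        "X": x,                               # throughput
        "R": terminals / x - think_time,      # response time
        "Q": terminals - think_time * x,      # mean active load
        "U": [y * x for y in demands],        # device utilizations
    }


result = interactive_measures([4.0, 0.8, 2.1, 0.9, 1.2], 10.0, 2)
print({key: value for key, value in result.items() if key != "U"})
```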



Interactive Model With Homogeneous Service Time


The program for the interactive model with homogeneous
service times will handle up to six devices and any number of
terminals specified by the user. The short name of this program
is Interactive HST-6.

The program is a straightforward implementation of the
interactive algorithm and provides all of the performance
parameters indicated above. The following cover inputs, outputs,
and controls for the program.


Inputs to the model

   • number of terminals                 M
   • device number (1 - 6)               k
   • device service times                Y_k
   • think time                          Z

Outputs in order of presentation are:

   • number of terminals                 M
   • normalizing constant                H(M,K)
   • system idle                         P(0)
   • throughput with M terminals         X(M)
   • response time                       R(M)
   • mean jobs in system                 Q
   • up to six pairs of:
        - device number                  k
        - utilization                    U_k
   • 99 indicating end of output

Input Controls - M, k, Y_k plus RUN


These three controls are located on function keys A, B, C,
respectively. Depressing the key interrupts program
execution and displays the current value of the variable.

   • insert a new value if required

   • hit RUN to confirm your input action

Note: k is a dual purpose input.

   • It indicates which device time, Y_k, will be
     input next during input operations.

   • It indicates the highest number of devices to
     be modeled during execution.

Execution Controls - EXEC, RUN

EXEC   Executes the program starting with an initialization
       of all required registers. The program will
       run until results are to be presented for a load of
       M terminals. EXEC is on function key E.

RUN    The program halts and displays its outputs in the
       preset order indicated above. RUN is used for two
       purposes:

       1. to obtain the next display in the cycle

       2. at the end of the output cycle depressing RUN
          will continue the operation increasing the load to
          M + 1 without having to start from scratch.

The sample problem in Figure 3 is a simple variation on the
original batch problem of Figure 1 - the job characteristics on
the central system are the same in both cases. The difference is
in the way jobs are introduced to the system: the original case
had N jobs always present, while in this case study M terminals
introduce the jobs to the system after a think time of 10 seconds.
Table III shows the results of running the program with loads of
2, 4, 6, 8, and 10 terminals.


Table III

Interactive HST-6 Results


M Terminals                 2        4        6        8       10

H(M,K)                    3.843    19.56    139.8    1454    22009
P(0) - system idle         .260     .051     .007    .0006   .00004
X(M) throughput            .099     .170     .214     .236     .246
R(M) response time        10.2     13.46    17.9     23.8     30.7
Q avg jobs in system       1.01     2.29     3.85     5.63     7.54

Utilizations:

  5 - Channel              .119     .20      .26      .28      .29
  4 - Disk 2               .089     .15      .19      .21      .22
  3 - Disk 1               .207     .36      .45      .50      .52
  2 - Swap                 .079     .14      .17      .19      .19
  1 - CPU                  .395     .68      .86      .95      .98


The minimum response time occurs when only one terminal is
active (not shown in table); the response time for one user is
simply the sum of the service times on each of the devices, 9.0
seconds. As the terminal load increases, the throughput of the
system rises rapidly at first and more slowly later on as the
system approaches its saturation limit. In both the batch and the
interactive cases this limit is established by the CPU component
of the workload. At 4 CPU seconds per job, the throughput limit
will be 1/4 = .25 jobs per second. At a load of 6 terminals the
throughput is roughly 86% of this limit. Response time is roughly
twice what it would be on a dedicated system. Adding more
terminals will make response time worse with little gain in
throughput.
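
The arithmetic behind these statements is short enough to check directly. The sketch below is an illustration only; the throughput and response time at 6 terminals are copied from Table III, and the variable names are assumptions of this example.

```python
# Total service times per job (seconds) for the sample problem
demands = {"CPU": 4.0, "Swap": 0.8, "Disk 1": 2.1, "Disk 2": 0.9, "Channel": 1.2}

# Saturation limit: the busiest device can complete at most 1 / Y_max jobs per second
x_limit = 1.0 / max(demands.values())

# Response time on a dedicated system: one job visits every device with no queuing
dedicated_response = sum(demands.values())

# Values for M = 6 terminals taken from Table III
x_at_6, r_at_6 = 0.214, 17.9

fraction_of_limit = x_at_6 / x_limit
response_ratio = r_at_6 / dedicated_response

print(x_limit, round(fraction_of_limit, 3), round(response_ratio, 2))
```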

A comparison of the interactive and batch cases raises an
interesting question:

At a terminal load of 4 users there is an average of 2.29
terminal users in the central system and the throughput is .170
jobs per second. This is less than the .172 jobs per second
throughput of the batch system with two jobs active. One might
have expected that with more jobs active (2.29 is greater than 2)
the system throughput would be greater, not less. I don't
know why this is so.

Interactive Model With Load Dependent Central Server


In the tutorial Denning and Buzen point out that the central 
system portion of an interactive system can be modeled as a single 
server with a load dependent service time. The article does not 
describe the algorithm for computing the performance quantities, 
but it: is a simple variation on the interactive algorithm present- 
ed in the previous section. For the load dependent central ser- 
ver, the matrix h can be viewed as a simple one column matrix; the 
single column represents the single load dependent server. The 
service time of the central system under a given constant load of 
n jobs, S(n), is equal to the reciprocal of the throughput for a 
system with n jobs and the terminal visit shorted out. 


One can think of such a system as a batch system with n jcbs and 
use the Batch HST-6 or Batch LDS-5 programs, as appropriate, to 
calculate the schedule of load dependent service times. 
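
As an illustration of how such a schedule arises, the sketch below computes S(n) for a batch system with the standard convolution (normalizing constant) recursion. It is not a transcription of Batch HST-6. The device demands are assumptions: they are the seconds per job implied by the Table III utilizations through U = X * Y (CPU 4, swap 0.8, disk 1 2.1, disk 2 0.9, channel 1.2), and the function name is hypothetical.

```python
def batch_service_schedule(demands, max_jobs):
    """Return {n: S(n)} for a closed network of fixed-rate devices.

    g[n] is the normalizing constant G(n); batch throughput with n
    jobs is G(n-1)/G(n), so S(n) = 1/X(n) = G(n)/G(n-1).
    """
    g = [1.0] + [0.0] * max_jobs
    for y in demands:                      # fold in one device at a time
        for n in range(1, max_jobs + 1):   # ascending n reuses updated g[n-1]
            g[n] += y * g[n - 1]
    return {n: g[n] / g[n - 1] for n in range(1, max_jobs + 1)}


# Assumed demands in seconds per job: CPU, swap, disk 1, disk 2, channel.
demands = [4.0, 0.8, 2.1, 0.9, 1.2]
schedule = batch_service_schedule(demands, 4)
for n, s in schedule.items():
    print(n, round(s, 2))
```

With these assumed demands the first four values come out close to the 9.0, 5.79, 4.83 and 4.41 seconds listed under Constant CPU in Table IV.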


For a system with M terminals successive elements h(m) of
the single column matrix h are computed by the following recursive
formula:

$$h(m) = 1 + \frac{m \, S(M-m+1)}{Z} \, h(m-1)$$

where:

    h(0) = 1
    m is stepped from 1 up to M
    Z is the think time
    S(n) = service time with n jobs active.
Note: The recursion takes the service time schedule in the
reverse order to increasing m. That is, S(M) is the first service
time in the recursion, S(1) is the last.

At the end of the recursion H(M) is found. A second pass of
the recursion is made with a terminal load of (M-1) terminals to
find H(M-1). This value H(M-1) is not the same as the value of
h(M-1) found on the previous pass of the recursion. H(M-1)
considers the service times S(M-1) thru S(1) in its recursion.
The h(M-1) of the previous recursion used S(M) thru S(2). The two
values H(M) and H(M-1) are used to find the following performance
quantities:


System idle          $P(0) = \dfrac{1}{H(M)}$

Throughput           $X(M) = \dfrac{M}{Z} \cdot \dfrac{H(M-1)}{H(M)}$

Response time        $R(M) = \dfrac{M}{X(M)} - Z$

Mean queue length    $Q = M - Z \, X(M)$
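
The recursion and the four performance quantities can be sketched in a few lines of Python. This is an illustration, not a listing of Interactive LDCS-1: the function names are hypothetical, the schedule S(1)..S(4) is the Constant CPU column of Table IV, and the think time Z = 10 seconds is an assumption chosen because it is consistent with the values in Tables IV and V.

```python
def h_constant(service, terminals, think):
    """H(M): run h(m) = 1 + m*S(M-m+1)/Z * h(m-1) from h(0) = 1."""
    h = 1.0
    for m in range(1, terminals + 1):
        h = 1.0 + m * service[terminals - m + 1] / think * h
    return h


def performance(service, terminals, think):
    """Return (P(0), X(M), R(M), Q) for M terminals and think time Z."""
    h_m = h_constant(service, terminals, think)
    h_m1 = h_constant(service, terminals - 1, think)   # second pass
    idle = 1.0 / h_m
    throughput = terminals / think * h_m1 / h_m
    response = terminals / throughput - think
    queue = terminals - think * throughput
    return idle, throughput, response, queue


# Constant CPU schedule S(1)..S(4) from Table IV; Z = 10 sec is assumed.
service = {1: 9.0, 2: 5.79, 3: 4.83, 4: 4.41}
for m in (2, 4):
    idle, x, r, q = performance(service, m, think=10.0)
    print(m, round(idle, 3), round(x, 3), round(r, 2), round(q, 2))
```

Under these assumptions the values for 2 and 4 terminals land close to the corresponding columns of Table V.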


Also, starting from the value for P(0) found above, one can
find the probability, P(n), of there being n jobs in the system
from the following recursion:

$$P(n) = \frac{(M-n+1) \, S(n)}{Z} \, P(n-1)$$
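
A minimal sketch of this recursion follows. Instead of starting from P(0) = 1/H(M), it builds the unnormalized terms with P(0) taken as 1 and divides by their sum, which is the same thing because that sum equals H(M). The function name is hypothetical, S(1) and S(2) are taken from Table IV, and Z = 10 seconds is an assumed think time.

```python
def state_probabilities(service, terminals, think):
    """Return [P(0), ..., P(M)] from P(n) = (M-n+1)*S(n)/Z * P(n-1)."""
    weights = [1.0]                       # unnormalized, P(0) taken as 1
    for n in range(1, terminals + 1):
        step = (terminals - n + 1) * service[n] / think
        weights.append(weights[-1] * step)
    total = sum(weights)                  # the sum plays the role of H(M)
    return [w / total for w in weights]


service = {1: 9.0, 2: 5.79}               # S(1), S(2) from Table IV
probs = state_probabilities(service, terminals=2, think=10.0)  # Z assumed
mean_jobs = sum(n * p for n, p in enumerate(probs))
print([round(p, 3) for p in probs], round(mean_jobs, 2))
```

The mean of this distribution should agree with Q = M - Z X(M), which gives a quick consistency check on a hand calculation.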


The program for computing the performance quantities for an
interactive system with a load dependent central server is called
Interactive LDCS-1. The program implements the algorithm and
computes the principal performance quantities described above. Due
to lack of sufficient program storage on the SR-52 the recursion
for calculating P(n) has not been included in Interactive LDCS-1.
Also, due to register storage limitations, no more than eleven
values of S(n) can be handled. This limits the effective degree
of multiprogramming of the system to serving a maximum of eleven
terminals simultaneously on the central system.

Actually the limit of eleven on the degree of multiprogramming
is not a serious one. In most real systems the throughput changes
attributable to operating above a degree of multiprogramming of
eleven are so slight, and such high levels occur so rarely, that
they can be neglected.

The Interactive LDCS-1 program offers an additional para- 
meter setting called the multiprogramming limit, N. In some real 
systems the size of main memory or possible operating system para- 
meters may limit the number of concurrent jobs that the system 
will consider ready for execution. When the level of multi- 
programming is set to N the program will consider the service time 
of the system to be a constant S(N) for loads greater than or 
equal to N. 

The rationale for modeling a fixed level of multiprogramming
in this manner is treated in the article by Chandy and Sauer [4]
on approximate methods. This is an approximation by use of flow
equivalent methods for passive elements of the system. The passive
element in this case is memory, which restricts the multiprogramming
to some level N which is less than the total terminal
population M. The system has been collapsed from a multiple device
system to an equivalent load dependent single server with a
schedule of service times S(n). For example, by only considering
service times S(1), S(2), S(3) and then using S(3) instead of
S(4), S(5), ..., S(M), one is effectively limiting multiprogramming
to level 3. That is to say, the "improved" service times due to
multiprogramming at levels higher than three are denied by setting
them to S(3).
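
The effect of the limit can be sketched by holding the schedule at S(N) for every load of N or more before running the recursion. The code below is an illustration under assumptions, not the LDCS-1 program: the function name is hypothetical, the schedule is the Load Dependent CPU column of Table IV, and Z = 10 seconds is an assumed think time.

```python
def limited_throughput(service, terminals, limit, think):
    """X(M) when loads above the multiprogramming limit N use S(N)."""
    def s(n):
        return service[min(n, limit)]     # hold S(n) at S(N) for n >= N

    def h_constant(m_total):
        h = 1.0
        for m in range(1, m_total + 1):
            h = 1.0 + m * s(m_total - m + 1) / think * h
        return h

    ratio = h_constant(terminals - 1) / h_constant(terminals)
    return terminals / think * ratio


# Load Dependent CPU schedule S(1)..S(5) from Table IV; Z = 10 sec assumed.
service = {1: 9.0, 2: 5.79, 3: 5.13, 4: 5.28, 5: 5.93}
for limit in (1, 2, 3, 4, 5):
    print(limit, round(limited_throughput(service, 5, limit, 10.0), 3))
```

For five terminals the results should land close to the five-terminal row of Table VI, with the highest throughput at a limit of 3.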


The following points cover inputs, outputs, and controls for 
the program. 


Inputs to the model

    think time                           Z
    number of terminals                  M
    multiprogramming limit               N
    load index                           n
    load dependent service time          S(n)

Outputs in order of presentation are

    • number of terminals                M
    • the matrix constant for M          H(M)
    • system idle                        P(0)
    • the matrix constant for M-1        H(M-1)
    • system throughput                  X(M)
    • response time                      R(M)
    • mean number in system              Q
    • 99 indicating end of output cycle

Input Controls: Z, M, N and RUN

These three controls are located on function keys A, B, and
C respectively. Depressing the key interrupts the program and
causes the current value of the variable to be displayed.

    • insert a new value if required
    • depress RUN to confirm your input action


Input Controls: n, S(n) and RUN

Function key D is labelled n, S(n) and is used with RUN to
enter the schedule of load dependent service times S(n).

    • depress function key D
    • value n is auto incremented and displayed
    • enter different value of n if desired
    • depress RUN
    • service time, S(n), is displayed
    • enter different service time if desired
    • depress RUN to confirm
    • repeat until all values of S(n) are entered


Execution Controls - EXEC, RUN

EXEC    Executes the program starting with an initialization
        of all required registers. The program will run until
        results are to be presented for a load of M terminals.
        EXEC is on function key E.

RUN     The program halts and displays its outputs in the
        preset order indicated above. RUN is used for two
        purposes:

        1. to obtain the next display in the cycle

        2. at the end of the output cycle depressing RUN will
           continue the operation, increasing the load to
           M + 1 without having to start from scratch.
Note: Computation of the performance parameters, except for the
case where terminal load M=1, requires two consecutive passes
through the recursive formula. On the first pass the terminal
load should be set at M-1 and run through the complete output
cycle. [On this "primer pass" only H(M-1) and P(0) are guaranteed
to be correct. H(M-1) is saved for the next pass.] After the "99"
display at end of the cycle depress RUN. This will increment the
terminal load from M-1 to M and cause the second pass through the
recursive formula. The output displays will be correct for load
M.

Depressing RUN at the end of any cycle executes the next
pass and provides results for the next higher terminal load:
i.e., M+1, M+2, ... etc.

Once again we turn to the sample problem in Figure 3. We
are interested in studying the performance of the interactive
system over a range of terminal loads from 2 through 10. We
therefore will need the schedule of load dependent service times
for the corresponding batch system with the number of jobs equal
to 1 through 10. While we are at it, we might as well get the
schedule of S(n) for the batch system variation in which the CPU
had a load dependent service time. (Recall that this will lead to
reduced throughput at higher multiprogramming levels.) Table IV
shows the load dependent central server schedules, S(n), for the
two batch systems.
Table IV

Load Dependent Central Server Schedules

              Constant CPU               Load Dependent CPU
N Jobs     Y: CPU    System S(n)     Y(1,n): CPU    System S(n)

   1       4 sec      9.0 sec           4 sec         9.0 sec
   2       4          5.79              4             5.79
   3       4          4.83              5             5.13
   4       4          4.41              6             5.28
   5       4          4.21              7             5.93
   6       4          4.11              8             6.91
   7       4          4.05              9             8.04
   8       4          4.03             10             9.19
   9       4          4.02             11            10.32
  10       4          4.01             12            11.42


In addition to being a schedule of S(n) inputs for the
model, Table IV is interesting in its own right. The columns
headed by Constant CPU show the CPU time and S(n) from using Batch
HST-6. Similar columns under Load Dependent CPU were calculated
using Batch LDS-5. In both cases the CPU is the heaviest component
of the work load and the system service time S(n) approaches
it asymptotically. What's interesting in the load dependent
case is that S(n) dips below the CPU service time and then
approaches it from that vantage point. It is also evident from
the table that increasing the multiprogramming level in the
constant CPU case beyond about four jobs will have very little
payoff. For the load dependent CPU, going beyond a level of three
jobs is expected to hurt performance.
Table V

Interactive LDCS-1 Results

Terminals M              2        4        6        8        10

H(M)                     3.84     19.55    139.7    1450     21913

System Idle P(0)         .260     .051     .007     .0007    .00004

H(M-1)                   1.9      8.34     50.0     429      5383

Throughput X(M)          .099     .170     .214     .237     .246

Response Time R(M)       10.22    13.45    18.0     23.8     30.7

Number in System Q       1.01     2.29     3.85     5.63     7.54

Table V shows the results of running the program for the
constant CPU case. Except for minor roundoff differences, due to
entering S(n) to only three places, the results are the same as
previously indicated in Table III. The results agree "exactly"
when all quantities are entered to the maximum precision allowed
by the calculator.

A much more interesting set of results is found by
examining the effects of multiprogramming level on the performance
of the system with the load dependent CPU. Multiprogramming
levels, N, of 1, 2, 3, and 4 were examined for terminal loads, M,
of 1 through 10. Table VI records the resulting throughputs and
response times.
The best performance occurs at multiprogramming level 3,
the third column of the table. This was expected because S(n) was
a minimum at a load of 3. The rightmost column of the table shows
how poorly the system will perform if no limits are placed on
multiprogramming [i.e. the multiprogramming level N is made equal
to the number of terminals M]. Here the best throughput is achieved
with 5 terminals active because it is not until this point that the
average number of jobs in the system gets up to around 3.

Table VI

Throughput and Response Time With Load Dependent CPU

No. of                          Multiprogramming Level N
Terminals                     1       2       3       4       M

    1       Throughput      .053    .053    .053    .053    .053
            Response         9.0     9.0     9.0     9.0     9.0

    2       X(M)            .086    .099    .099    .099    .099
            R(M)            13.3    10.2    10.2    10.2    10.2

    3       X(M)            .102    .133    .137    .137    .137
            R(M)            19.3    12.5    11.9    11.9    11.9

    4       X(M)            .109    .155    .164    .163    .163
            R(M)            26.7    15.8    14.4    14.5    14.5

    5       X(M)            .110    .166    .181    .178    .175
            R(M)            35.2    20.1    17.6    18.0    18.5

    6       X(M)            .111    .170    .189    .186    .171
            R(M)            44.0    25.2    21.7    22.3    25.1

    7       X(M)            .111    .172    .193    .188    .153
            R(M)            53.0    30.7    26.2    27.2    35.4

    8       X(M)            .111    .172    .194    .189    .132
            R(M)            62.0    36.4    31.1    32.3    50.4

    9       X(M)            .111    .173    .195    .189    .113
            R(M)            71.0    42.1    36.2    37.5    69.6

   10       X(M)            .111    .173    .195    .189    .098
            R(M)            80.0    47.9    41.3    42.8    91.6
The results from this case study are graphically presented
in Figure 4 as a family of performance plots. Additional
multiprogramming levels not shown in the table have been added to
show how the throughput varies for the range 4 through 10. Also,
to allow comparison with a system which does not have the load
dependent CPU, three additional plots are shown as dashed lines in
the figure. These three throughput curves were generated using
the S(n) schedule from Table IV labeled Constant CPU.

There is a lot of information conveyed by the figure. A
few points will be made to illustrate what can be learned. A system
without a load dependent CPU can be viewed as an "ideal
system" because it does not require more system overhead per job
to manage 10 jobs than to manage 2 jobs. The first four points
discuss performance of this ideal system.

1. The uppermost dashed curve labeled N=10 shows the
"best" possible throughput for the ideal system.
There is no "extra overhead work" which was modeled
as a load dependent CPU. There is no practical
limit on multiprogramming with the limit set at 10.
The system saturates at a throughput of .25 when the
limiting device, the CPU, reaches 100% busy.

2. Dashed curves labeled N=4 and N=3 indicate how
throughput would drop due to limiting the level of
multiprogramming. Main memory size is often such a
limiter of multiprogramming.
Figure 4

Throughput vs Terminal Load
with Various Levels of Multiprogramming

[Plot not reproduced. Key: dashed curves, Ideal System (Constant
CPU); solid curves, Realistic System (Load Dependent CPU); N,
Multiprogramming Level.]



3. Solid curves N=1,1 and N=2,2 show the throughput
for levels 1 and 2 for both the ideal system and
the system with a load dependent CPU. Recall that
the load dependent function didn't start increasing
the CPU load until 3 jobs were in the system.

4. The large difference between the N=1 curve and the
dashed N=10 curve shows the expected gains due to
multiprogramming for the "ideal" system. Forty-four
percent of the potential gain is achieved by going
from level 1 to 2. [Seventy percent by going from
1 to 3, 84% for going from 1 to 4.]
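
The text does not say how these percentages were obtained. One reading, offered here as an assumption, compares the saturation throughputs 1/S(N) of the Constant CPU schedule in Table IV and takes level 10 as the full potential. The sketch below follows that reading and lands within about a percentage point of the quoted figures.

```python
# Constant CPU schedule from Table IV: S(N) for N = 1, 2, 3, 4 and 10.
service = {1: 9.0, 2: 5.79, 3: 4.83, 4: 4.41, 10: 4.01}

# Assumed basis: saturation throughput at level N is 1/S(N).
limit = {n: 1.0 / s for n, s in service.items()}
potential = limit[10] - limit[1]          # gain from level 1 to level 10

gain = {n: (limit[n] - limit[1]) / potential for n in (2, 3, 4)}
for n, fraction in gain.items():
    print(f"level 1 to {n}: {fraction:.1%} of the potential gain")
```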

In most real systems there is some amount of extra overhead
involved with operating at higher levels of multiprogramming.
Increased paging activity or increased swapping is such a
form of load dependent behavior, which could result in higher CPU
activity for storage management and page/swap support. The solid
lines in this figure show a hypothetical system which is exhibiting
realistic system behavior. The distinguishing character of
the realistic throughput curve is that things get better up to a
point where saturation occurs and then, if the load is increased,
the throughput will actually get worse. Four points are made
about the "hypothetical realistic" system.

1. The solid plot for N=3 shows the best throughput
for the system. The service time, S(N), with three
jobs in the system is at its lowest so throughput
will be best if multiprogramming is at level 3.

2. The difference between the solid plot N=3 and the
dashed plot N=3 indicates the difference between an
ideal system and its "realistic" counterpart.

3. Increasing the multiprogramming level from 3 to 4
hurts performance a little. The difference between
the ideal and realistic systems has increased.

4. Increasing the multiprogramming level to five or
beyond actually results in lowering the throughput.
In all of these cases the throughput approaches a
limit which is 1/S(N).
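
The limiting throughput at multiprogramming level N is the reciprocal of the service time S(N), since with enough terminals the central server always holds N jobs. The short sketch below tabulates that reciprocal for the Load Dependent CPU schedule of Table IV; the variable names are illustrative.

```python
# Load Dependent CPU schedule from Table IV, S(N) in seconds.
service = {1: 9.0, 2: 5.79, 3: 5.13, 4: 5.28, 5: 5.93, 10: 11.42}

# With many terminals the central server is always busy at level N,
# so throughput approaches 1/S(N).
limits = {n: 1.0 / s for n, s in service.items()}
best_level = max(limits, key=limits.get)
for n, x in limits.items():
    print(n, round(x, 3))
print("best level:", best_level)
```

The values for levels 1 through 4 can be compared with the ten-terminal row of Table VI, where the throughput has nearly reached its limit.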

This is an example of paradoxical behavior which occurs
from time to time. Conventional wisdom says increasing the
multiprogramming level is good. Conventional wisdom also says that
the benefits of increased multiprogramming are progressively
diminishing. Conventional wisdom does not predict that throughput
will drop with increased multiprogramming, as this case seems to
indicate. Paradoxically, conventional wisdom is correct if we are
trying to distinguish causal relationships. The root cause of the
poor performance is the increased overhead for storage management,
modeled in this case by a load dependent CPU. Increasing the
multiprogramming level merely allows the storage management problem
to manifest itself.
SUMMARY


Starting from the tutorial by Denning and Buzen [and that
is an excellent place for anyone to start] the algorithms for
handling closed networks with a single job class were adapted for
use on the SR-52 programmable calculator. Along the way it was
found that by slightly altering the Buzen algorithm to process the
G and H matrices row by row instead of column by column, six
devices and an unlimited job/terminal population could be handled
on the SR-52. Techniques were also introduced for handling a
simple load dependent server and for studying interactive systems
with fixed multiprogramming limits.

The paper provides listings of the four programs and a
sample case study which can be replicated on the SR-52.

Next on the agenda is conversion to the TI-59, additional
load dependent servers, and some simple aids for approximating
systems with parallel tasks.